Superstore Data Analysis.ipynb
- Superstore Sales Analysis
- Project Overview
- Objective
- Dataset Info
- **Key Questions to Answer**
- Importing Libraries & Loading Data
- Data Cleaning & Preparation
- Sales and Profit Over Time
- Insights
- Business Implication
- Insights
- Business Implication
- Sales and Profit by Category & Sub-Category
- Insights
- Business Implication
- Sales versus profit
- Insights
- Business Recommendations:
- Which states and regions generate the highest and lowest profit?
- 📈 Profit Analysis by State and Region
- 🔍 Insights
- ✅ High-Profit Areas
- ❌ Low/Negative-Profit Areas
- 💡 Business Recommendations
- 1. 🚀 Focus on High-Profit States
- 2. 🔎 Investigate Loss-Making States
- 3. 📍 Implement Regional Strategies
- 4. ⚙️ Optimize Operations in Central and South
- 5. 📊 Invest in Data-Driven Tools
- Which customer segments are the most valuable?
- ❓ Question: Which Customer Segments Are the Most Valuable?
- ✅ Answer:
- 🥇 Most Valuable Segment:
- 🥈 Second Most Valuable:
- 🥉 Least Valuable:
- 📌 Business Implication:
- Top 10 Customers by Sales
- 💼 Business Recommendations for Top 10 Customers by Sales
- 🎯 Objective:
- 🔝 Recommendations:
- 📈 Resulting Benefits:
- Sales by Region and Segment
- Sales by Region and Segment
- 🔍 Insights:
- ✅ Business Recommendations:
- Summary:
- File
- Edit
- View
- Run
- Kernel
- Settings
- Help
Kernel status: Idle
## Superstore Sales Analysis
Superstore Sales Analysis¶
## Project Overview
This project analyzes the **Superstore** dataset to uncover sales patterns, profit distribution, customer behaviour and regional performance.
Using Python and Plotly Express, I created interactive visualizations to identify key business insights and suggest data-driven recommendations.
Project Overview¶
This project analyzes the Superstore dataset to uncover sales patterns, profit distribution, customer behaviour and regional performance. Using Python and Plotly Express, I created interactive visualizations to identify key business insights and suggest data-driven recommendations.
## Objective
To explore the Superstore dataset and answer critical business questions to help improve decision-making in areas of sales, profit, shipping and customer segmentation.
Objective¶
To explore the Superstore dataset and answer critical business questions to help improve decision-making in areas of sales, profit, shipping and customer segmentation.
## Dataset Info
- Source: Sample Superstore dataset from Kaggle
- Fields: Order Date, Ship Date, Sales, Profit, Category,Product Name, Customer Name, Segment etc.
Dataset Info¶
- Source: Sample Superstore dataset from Kaggle
- Fields: Order Date, Ship Date, Sales, Profit, Category,Product Name, Customer Name, Segment etc.
## **Key Questions to Answer**
1. What are the overall sales and profit trends over time?
2. Which product categories and sub-categories are the most/least profitable?
3. Which states and regions generate the highest and lowest profit?
4. Which customer segments are the most valuable?
5. Top 10 Customers by Sales
6. Sales by Region and Segment
Key Questions to Answer¶
What are the overall sales and profit trends over time?
Which product categories and sub-categories are the most/least profitable?
Which states and regions generate the highest and lowest profit?
Which customer segments are the most valuable?
Top 10 Customers by Sales
Sales by Region and Segment
## Importing Libraries & Loading Data
Importing Libraries & Loading Data¶
[3]:
import pandas as pd
import plotly.express as px
# load dataset
df = pd.read_csv("C:/Users/DELL/Downloads/superstore_dataset.csv")
#checking the first few rows
df.head()
[3]:
| order_id | order_date | ship_date | customer | manufactory | product_name | segment | category | subcategory | region | zip | city | state | country | discount | profit | quantity | sales | profit_margin | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | US-2020-103800 | 1/3/2019 | 1/7/2019 | Darren Powers | Message Book | Message Book, Wirebound, Four 5 1/2" X 4" Form... | Consumer | Office Supplies | Paper | Central | 77095 | Houston | Texas | United States | 0.2 | 5.5512 | 2 | 16.448 | 0.3375 |
| 1 | US-2020-112326 | 1/4/2019 | 1/8/2019 | Phillina Ober | GBC | GBC Standard Plastic Binding Systems Combs | Home Office | Office Supplies | Binders | Central | 60540 | Naperville | Illinois | United States | 0.8 | -5.4870 | 2 | 3.540 | -1.5500 |
| 2 | US-2020-112326 | 1/4/2019 | 1/8/2019 | Phillina Ober | Avery | Avery 508 | Home Office | Office Supplies | Labels | Central | 60540 | Naperville | Illinois | United States | 0.2 | 4.2717 | 3 | 11.784 | 0.3625 |
| 3 | US-2020-112326 | 1/4/2019 | 1/8/2019 | Phillina Ober | SAFCO | SAFCO Boltless Steel Shelving | Home Office | Office Supplies | Storage | Central | 60540 | Naperville | Illinois | United States | 0.2 | -64.7748 | 3 | 272.736 | -0.2375 |
| 4 | US-2020-141817 | 1/5/2019 | 1/12/2019 | Mick Brown | Avery | Avery Hi-Liter EverBold Pen Style Fluorescent ... | Consumer | Office Supplies | Art | East | 19143 | Philadelphia | Pennsylvania | United States | 0.2 | 4.8840 | 3 | 19.536 | 0.2500 |
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9994 entries, 0 to 9993 Data columns (total 19 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 order_id 9994 non-null object 1 order_date 9994 non-null object 2 ship_date 9994 non-null object 3 customer 9994 non-null object 4 manufactory 9994 non-null object 5 product_name 9994 non-null object 6 segment 9994 non-null object 7 category 9994 non-null object 8 subcategory 9994 non-null object 9 region 9994 non-null object 10 zip 9994 non-null int64 11 city 9994 non-null object 12 state 9994 non-null object 13 country 9994 non-null object 14 discount 9994 non-null float64 15 profit 9994 non-null float64 16 quantity 9994 non-null int64 17 sales 9994 non-null float64 18 profit_margin 9994 non-null float64 dtypes: float64(4), int64(2), object(13) memory usage: 1.4+ MB
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9994 entries, 0 to 9993 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 order_id 9994 non-null object 1 order_date 9994 non-null datetime64[ns] 2 ship_date 9994 non-null datetime64[ns] 3 customer 9994 non-null object 4 manufactory 9994 non-null object 5 product_name 9994 non-null object 6 segment 9994 non-null object 7 category 9994 non-null object 8 subcategory 9994 non-null object 9 region 9994 non-null object 10 zip 9994 non-null int64 11 city 9994 non-null object 12 state 9994 non-null object 13 country 9994 non-null object 14 discount 9994 non-null float64 15 profit 9994 non-null float64 16 quantity 9994 non-null int64 17 sales 9994 non-null float64 18 profit_margin 9994 non-null float64 19 Delivery Time 9994 non-null int64 dtypes: datetime64[ns](2), float64(4), int64(3), object(11) memory usage: 1.5+ MB
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[9], line 10 6 df.isnull().sum() 8 #dropping the Order Date and Ship Date column ---> 10 df = df.drop(['Order Date', 'Ship Date'], axis = 1) File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:5581, in DataFrame.drop(self, labels, axis, index, columns, level, inplace, errors) 5433 def drop( 5434 self, 5435 labels: IndexLabel | None = None, (...) 5442 errors: IgnoreRaise = "raise", 5443 ) -> DataFrame | None: 5444 """ 5445 Drop specified labels from rows or columns. 5446 (...) 5579 weight 1.0 0.8 5580 """ -> 5581 return super().drop( 5582 labels=labels, 5583 axis=axis, 5584 index=index, 5585 columns=columns, 5586 level=level, 5587 inplace=inplace, 5588 errors=errors, 5589 ) File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:4788, in NDFrame.drop(self, labels, axis, index, columns, level, inplace, errors) 4786 for axis, labels in axes.items(): 4787 if labels is not None: -> 4788 obj = obj._drop_axis(labels, axis, level=level, errors=errors) 4790 if inplace: 4791 self._update_inplace(obj) File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:4830, in NDFrame._drop_axis(self, labels, axis, level, errors, only_slice) 4828 new_axis = axis.drop(labels, level=level, errors=errors) 4829 else: -> 4830 new_axis = axis.drop(labels, errors=errors) 4831 indexer = axis.get_indexer(new_axis) 4833 # Case for non-unique axis 4834 else: File ~\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:7070, in Index.drop(self, labels, errors) 7068 if mask.any(): 7069 if errors != "ignore": -> 7070 raise KeyError(f"{labels[mask].tolist()} not found in axis") 7071 indexer = indexer[~mask] 7072 return self.delete(indexer) KeyError: "['Order Date', 'Ship Date'] not found in axis"
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9994 entries, 0 to 9993 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 order_id 9994 non-null object 1 order_date 9994 non-null datetime64[ns] 2 ship_date 9994 non-null datetime64[ns] 3 customer 9994 non-null object 4 manufactory 9994 non-null object 5 product_name 9994 non-null object 6 segment 9994 non-null object 7 category 9994 non-null object 8 subcategory 9994 non-null object 9 region 9994 non-null object 10 zip 9994 non-null int64 11 city 9994 non-null object 12 state 9994 non-null object 13 country 9994 non-null object 14 discount 9994 non-null float64 15 profit 9994 non-null float64 16 quantity 9994 non-null int64 17 sales 9994 non-null float64 18 profit_margin 9994 non-null float64 19 Delivery Time 9994 non-null int64 dtypes: datetime64[ns](2), float64(4), int64(3), object(11) memory usage: 1.5+ MB
Top 10 Customers by Sales:
customer sales
0 Sean Miller 25043.050
1 Tamara Chand 19052.218
2 Raymond Buch 15117.339
3 Tom Ashbrook 14595.620
4 Adrian Barton 14473.571
5 Ken Lonsdale 14175.229
6 Sanjit Chand 14142.334
7 Hunter Lopez 12873.298
8 Sanjit Engle 12209.438
9 Christopher Conant 12129.072
Common Tools
No metadata.
Advanced Tools
No metadata.
Anaconda Assistant
AI-powered coding, insights and debugging in your notebooks.
To enable the following extensions, create an account or sign in.
- Anaconda Assistant4.0.15
- Coming soon!
- Data Catalogs
- Panel Deployments
- Sharing
Already have an account? Sign In
For more information, read our Anaconda Assistant documentation.
